Fast and Scalable Startup of MPI Programs in InfiniBand Clusters
Abstract
One of the major challenges in parallel computing over large-scale clusters is fast and scalable process startup, which can typically be divided into two phases: process initiation and connection setup. In this paper, we characterize the startup of MPI programs in InfiniBand clusters and identify two startup scalability issues: serialized process initiation in the initiation phase and high communication overhead in the connection setup phase. To reduce the connection setup time, we have developed one approach that uses data reassembly to reduce data volume, and another that uses a bootstrap channel to parallelize the communication. Furthermore, a process management framework, the Multi-Purpose Daemons (MPD) system, is exploited to speed up process initiation. Our experimental results show that job startup time improves by more than a factor of 4 for 128-process jobs, and our analytical models suggest that the improvement can exceed two orders of magnitude for 2048-process jobs.
Similar resources
Scalable High Performance Message Passing over InfiniBand for Open MPI
InfiniBand (IB) is a popular network technology for modern high-performance computing systems. MPI implementations traditionally support IB using a reliable, connection-oriented (RC) transport. However, per-process resource usage that grows linearly with the number of processes makes this approach prohibitive for large-scale systems. IB provides an alternative in the form of a connectionless u...
A Scalable InfiniBand Network Topology-Aware Performance Analysis Tool for MPI
Over the last decade, InfiniBand (IB) has become an increasingly popular interconnect for deploying modern supercomputing systems. As supercomputing systems grow in size and scale, the impact of IB network topology on the performance of high performance computing (HPC) applications also increases. Depending on the kind of network (FAT Tree, Tori, or Mesh), the number of network hops involved in ...
Using InfiniBand for a scalable compute infrastructure
Table of contents (excerpt): Introduction; InfiniBand technology; ...
Scalable and High Performance Collective Communication for next Generation Multicore Infiniband Clusters
High Performance Computing is enabling rapid innovations spanning several key areas ranging from science, technology and manufacturing disciplines to entertainment and financial markets. One computing paradigm contributing significantly to the outreach of such capabilities is Cluster Computing. Cluster computing involves the use of multiple Commodity PCs interconnected by a network to provide t...
A Scalable Process-Management Environment for Parallel Programs
We present a process management system for parallel programs such as those written using MPI. A primary goal of the system, which we call MPD (for multipurpose daemon), is to be scalable. By this we mean that startup of interactive parallel jobs comprising a thousand processes is quick, that signals can be quickly delivered to processes, and that stdin, stdout, and stderr are managed intuitivel...
Publication date: 2004